skip to main content


Search for: All records

Creators/Authors contains: "Wang, Tianhao"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Proper communication is key to the adoption and implementation of differential privacy (DP). In this work, we designed explanative illustrations of three DP models (Central DP, Local DP, Shuffler DP) to help laypeople conceptualize how random noise is added to protect individuals’ privacy and preserve group utility. Following a pilot survey and an interview, we conducted an online experiment ( N = 300) exploring participants’ comprehension, privacy and utility perception, and data-sharing decisions across the three DP models. We obtained empirical evidence showing participants’ acceptance of the Shuffler DP model for data privacy protection. We discuss the implications of our findings.

     
    more » « less
    Free, publicly-accessible full text available September 1, 2024
  2. We present PROVNINJA, a framework designed to generate adversarial attacks that aim to elude provenance-based Machine Learning (ML) security detectors. PROVNINJA is designed to identify and craft adversarial attack vectors that statistically mimic and impersonate system programs. Leveraging the benign execution profile of system processes commonly observed across a multitude of hosts and networks, our research proposes an efficient and effective method to probe evasive alternatives and devise stealthy attack vectors that are difficult to distinguish from benign system behaviors. PROVNINJA's suggestions for evasive attacks, originally derived in the feature space, are then translated into system actions, leading to the realization of actual evasive attack sequences in the problem space. When evaluated against State-of-The-Art (SOTA) detector models using two realistic Advanced Persistent Threat (APT) scenarios and a large collection of fileless malware samples, PROVNINJA could generate and realize evasive attack variants, reducing the detection rates by up to 59%. We also assessed PROVNINJA under varying assumptions on adversaries' knowledge and capabilities. While PROVNINJA primarily considers the black-box model, we also explored two contrasting threat models that consider blind and whitebox attack scenarios. 
    more » « less
    Free, publicly-accessible full text available August 31, 2024
  3. Publishing trajectory data (individual’s movement information) is very useful, but it also raises privacy concerns. To handle the privacy concern, in this paper, we apply differential privacy, the standard technique for data privacy, together with Markov chain model, to generate synthetic trajectories. We notice that existing studies all use Markov chain model and thus propose a framework to analyze the usage of the Markov chain model in this problem. Based on the analysis, we come up with an effective algorithm PrivTrace that uses the first-order and second-order Markov model adaptively. We evaluate PrivTrace and existing methods on synthetic and real-world datasets to demonstrate the superiority of our method. 
    more » « less
    Free, publicly-accessible full text available August 9, 2024
  4. Privacy and Byzantine resilience are two indispensable requirements for a federated learning (FL) system. Although there have been extensive studies on privacy and Byzantine security in their own track, solutions that consider both remain sparse. This is due to difficulties in reconciling privacy-preserving and Byzantine-resilient algorithms.

    In this work, we propose a solution to such a two-fold issue. We use our version of differentially private stochastic gradient descent (DP-SGD) algorithm to preserve privacy and then apply our Byzantine-resilient algorithms. We note that while existing works follow this general approach, an in-depth analysis on the interplay between DP and Byzantine resilience has been ignored, leading to unsatisfactory performance. Specifically, for the random noise introduced by DP, previous works strive to reduce its seemingly detrimental impact on the Byzantine aggregation. In contrast, we leverage the random noise to construct a first-stage aggregation that effectively rejects many existing Byzantine attacks. Moreover, based on another property of our DP variant, we form a second-stage aggregation which provides a final sound filtering. Our protocol follows the principle of co-designing both DP and Byzantine resilience.

    We provide both theoretical proof and empirical experiments to show our protocol is effective: retaining high accuracy while preserving the DP guarantee and Byzantine resilience. Compared with the previous work, our protocol 1) achieves significantly higher accuracy even in a high privacy regime; 2) works well even when up to 90% distributive workers are Byzantine. 

    more » « less
    Free, publicly-accessible full text available June 13, 2024
  5. In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes, and a server wants to leverage the data to train a model. To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques, where the data parties share only information for training the model, instead of the private data. However, it is challenging to ensure that the shared information maintains privacy while learning accurate models. To the best of our knowledge, the algorithm proposed in this paper is the first practical solution for differentially private vertical federatedk-means clustering, where the server can obtain a set of global centers with a provable differential privacy guarantee. Our algorithm assumes an untrusted central server that aggregates differentially private local centers and membership encodings from local data parties. It builds a weighted grid as the synopsis of the global dataset based on the received information. Final centers are generated by running anyk-means algorithm on the weighted grid. Our approach for grid weight estimation uses a novel, light-weight, and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin sketch. To improve the estimation accuracy in the setting with more than two data parties, we further propose a refined version of the weights estimation algorithm and a parameter tuning strategy to reduce the finalk-means loss to be close to that in the central private setting. We provide theoretical utility analysis and experimental evaluation results for the cluster centers computed by our algorithm and show that our approach performs better both theoretically and empirically than the two baselines based on existing techniques 
    more » « less
  6. There is great demand for scalable, secure, and efficient privacy-preserving machine learning models that can be trained over distributed data. While deep learning models typically achieve the best results in a centralized non-secure setting, different models can excel when privacy and communication constraints are imposed. Instead, tree-based approaches such as XGBoost have attracted much attention for their high performance and ease of use; in particular, they often achieve state-of-the-art results on tabular data. Consequently, several recent works have focused on translating Gradient Boosted Decision Tree (GBDT) models like XGBoost into federated settings, via cryptographic mechanisms such as Homomorphic Encryption (HE) and Secure Multi-Party Computation (MPC). However, these do not always provide formal privacy guarantees, or consider the full range of hyperparameters and implementation settings. In this work, we implement the GBDT model under Differential Privacy (DP). We propose a general framework that captures and extends existing approaches for differentially private decision trees. Our framework of methods is tailored to the federated setting, and we show that with a careful choice of techniques it is possible to achieve very high utility while maintaining strong levels of privacy. 
    more » « less